Main Datasets (with hospitalization data)

Sources:
  • https://covidtracking.com/
  • https://github.com/CSSEGISandData/COVID-19
  • Various state data, third-party data, and various federal data

Combine, validate, and verify data sets.

# see what filtered dataframe looks like
all_cases.head(50)
date state positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently recovered dataQualityGrade ... totalTestsViral positiveTestsViral negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade
0 2020-06-28 AK 883.0 348.0 12.0 NaN NaN 1.0 521.0 A ... 108300.0 NaN NaN NaN 0 0 0 0 0 NaN
1 2020-06-28 AL 35441.0 15656.0 655.0 2703.0 NaN NaN 18866.0 B ... NaN NaN NaN 34964.0 0 0 0 0 0 NaN
2 2020-06-28 AR 19310.0 5781.0 278.0 1373.0 NaN 63.0 13270.0 A ... NaN NaN NaN 19310.0 0 0 0 0 0 NaN
4 2020-06-28 AZ 73908.0 63394.0 2691.0 4617.0 666.0 475.0 8926.0 A+ ... 509485.0 NaN NaN 73497.0 0 0 0 0 0 NaN
5 2020-06-28 CA 211243.0 NaN 5956.0 NaN 1602.0 NaN NaN B ... 3955952.0 NaN NaN 211243.0 0 0 0 0 0 NaN
6 2020-06-28 CO 32022.0 25906.0 235.0 5399.0 NaN NaN 4442.0 A ... NaN NaN NaN 29194.0 0 0 0 0 0 NaN
7 2020-06-28 CT 46303.0 33934.0 103.0 10268.0 NaN NaN 8053.0 B ... 436644.0 NaN NaN 44324.0 0 0 0 0 0 NaN
8 2020-06-28 DC 10248.0 8499.0 126.0 NaN 34.0 27.0 1199.0 A+ ... NaN NaN NaN 10248.0 0 0 0 0 0 NaN
9 2020-06-28 DE 11226.0 4054.0 78.0 NaN 14.0 NaN 6665.0 A+ ... NaN NaN NaN 10162.0 0 0 0 0 0 NaN
10 2020-06-28 FL 141075.0 NaN NaN 14540.0 NaN NaN NaN A ... 2256314.0 182100.0 2070179.0 141075.0 0 0 0 0 0 NaN
11 2020-06-28 GA 77210.0 NaN 1236.0 10711.0 NaN NaN NaN A ... 806938.0 70881.0 736057.0 77210.0 0 0 0 0 0 NaN
13 2020-06-28 HI 872.0 140.0 NaN 110.0 NaN NaN 714.0 D ... 87882.0 872.0 87010.0 872.0 0 0 0 0 0 NaN
14 2020-06-28 IA 28489.0 10164.0 118.0 NaN 36.0 18.0 17620.0 A+ ... NaN NaN NaN 28489.0 0 0 0 0 0 NaN
15 2020-06-28 ID 5319.0 1330.0 NaN 312.0 NaN NaN 3898.0 A ... 85816.0 NaN NaN 4790.0 0 0 0 0 0 NaN
16 2020-06-28 IL 142776.0 NaN 1464.0 NaN 373.0 193.0 NaN A ... 1544978.0 NaN NaN 141723.0 0 0 0 0 0 NaN
17 2020-06-28 IN 44930.0 8376.0 617.0 7003.0 266.0 86.0 33935.0 A+ ... NaN NaN NaN 44930.0 0 0 0 0 0 NaN
18 2020-06-28 KS 13538.0 12495.0 NaN 1128.0 NaN NaN 779.0 A ... NaN NaN NaN 13538.0 0 0 0 0 0 NaN
19 2020-06-28 KY 15232.0 10944.0 386.0 2590.0 68.0 NaN 3730.0 B ... NaN NaN NaN 14732.0 0 0 0 0 0 NaN
20 2020-06-28 LA 56236.0 13245.0 715.0 NaN NaN 76.0 39792.0 B ... NaN NaN NaN 56236.0 0 0 0 0 0 NaN
21 2020-06-28 MA 108667.0 NaN 748.0 11319.0 134.0 81.0 NaN A+ ... 1048942.0 NaN NaN 103539.0 0 0 0 0 0 NaN
22 2020-06-28 MD 66777.0 58633.0 446.0 10793.0 158.0 NaN 4976.0 A ... 631490.0 NaN NaN 66777.0 0 0 0 0 0 NaN
23 2020-06-28 ME 3191.0 510.0 31.0 346.0 10.0 4.0 2577.0 A ... 93142.0 3884.0 89123.0 2838.0 0 0 0 0 0 NaN
24 2020-06-28 MI 69946.0 12689.0 557.0 NaN 193.0 106.0 51099.0 A+ ... 1033820.0 87087.0 946733.0 63261.0 0 0 0 0 0 NaN
25 2020-06-28 MN 35549.0 3280.0 288.0 4010.0 143.0 NaN 30809.0 A ... 585417.0 NaN NaN 35549.0 0 0 0 0 0 NaN
26 2020-06-28 MO 20575.0 NaN 412.0 NaN NaN NaN NaN B ... 424214.0 23527.0 399926.0 20575.0 0 0 0 0 0 NaN
28 2020-06-28 MS 25892.0 7611.0 676.0 3102.0 149.0 88.0 17242.0 A ... 280020.0 NaN NaN 25724.0 0 0 0 0 0 NaN
29 2020-06-28 MT 863.0 237.0 11.0 97.0 NaN NaN 604.0 C ... NaN NaN NaN 863.0 0 0 0 0 0 NaN
30 2020-06-28 NC 62142.0 23899.0 890.0 NaN NaN NaN 36921.0 A ... NaN NaN NaN 62142.0 0 0 0 0 0 NaN
31 2020-06-28 ND 3495.0 268.0 24.0 226.0 NaN NaN 3139.0 D ... 177229.0 NaN NaN 3495.0 0 0 0 0 0 NaN
32 2020-06-28 NE 18775.0 5455.0 123.0 1315.0 NaN NaN 13053.0 B ... NaN NaN NaN 18775.0 0 0 0 0 0 NaN
33 2020-06-28 NH 5717.0 949.0 35.0 562.0 NaN NaN 4401.0 B ... NaN NaN NaN 5717.0 0 0 0 0 0 NaN
34 2020-06-28 NJ 171182.0 126115.0 1014.0 19841.0 223.0 187.0 30092.0 A+ ... NaN NaN NaN 171182.0 0 0 0 0 0 NaN
35 2020-06-28 NM 11619.0 5877.0 122.0 1851.0 NaN NaN 5251.0 B ... NaN NaN NaN 11619.0 0 0 0 0 0 NaN
36 2020-06-28 NV 17160.0 15976.0 511.0 NaN 122.0 59.0 684.0 A+ ... 307131.0 NaN NaN 17160.0 0 0 0 0 0 NaN
37 2020-06-28 NY 392539.0 297694.0 869.0 89995.0 229.0 167.0 70010.0 A ... NaN NaN NaN 392539.0 0 0 0 0 0 NaN
38 2020-06-28 OH 50309.0 NaN 661.0 7681.0 182.0 101.0 NaN B ... NaN NaN NaN 46790.0 0 0 0 0 0 NaN
39 2020-06-28 OK 12994.0 3212.0 329.0 1456.0 134.0 NaN 9397.0 A+ ... 327683.0 13941.0 313021.0 12642.0 0 0 0 0 0 NaN
40 2020-06-28 OR 8341.0 5490.0 149.0 1022.0 53.0 35.0 2649.0 A+ ... NaN NaN 223317.0 7521.0 0 0 0 0 0 NaN
41 2020-06-28 PA 85496.0 12231.0 648.0 NaN NaN 121.0 66686.0 A+ ... NaN NaN NaN 81956.0 0 0 0 0 0 NaN
43 2020-06-28 RI 16661.0 14134.0 91.0 1984.0 16.0 15.0 1600.0 A+ ... NaN NaN NaN 16661.0 0 0 0 0 0 NaN
44 2020-06-28 SC 33320.0 19148.0 954.0 2622.0 NaN NaN 13456.0 A ... 359703.0 42618.0 317085.0 33221.0 0 0 0 0 0 NaN
45 2020-06-28 SD 6681.0 838.0 75.0 652.0 NaN NaN 5752.0 B ... NaN NaN NaN 6681.0 0 0 0 0 0 NaN
46 2020-06-28 TN 40172.0 13429.0 484.0 2564.0 NaN NaN 26159.0 B ... 748229.0 46468.0 701761.0 39848.0 0 0 0 0 0 NaN
47 2020-06-28 TX 148728.0 66361.0 5497.0 NaN NaN NaN 79974.0 B ... 1775219.0 NaN NaN NaN 0 0 0 0 0 NaN
48 2020-06-28 UT 21100.0 9002.0 289.0 1396.0 83.0 NaN 11931.0 A+ ... NaN NaN NaN 21100.0 0 0 0 0 0 NaN
49 2020-06-28 VA 61736.0 51999.0 818.0 8823.0 235.0 107.0 8005.0 A+ ... 625663.0 NaN NaN 59071.0 0 0 0 0 0 NaN
51 2020-06-28 VT 1202.0 200.0 15.0 NaN NaN NaN 946.0 B ... NaN NaN NaN 1202.0 0 0 0 0 0 NaN
52 2020-06-28 WA 31404.0 NaN 304.0 4240.0 NaN 58.0 NaN B ... NaN NaN NaN 31404.0 0 0 0 0 0 NaN
53 2020-06-28 WI 30707.0 7977.0 239.0 3393.0 89.0 NaN 21953.0 A+ ... NaN NaN NaN 27743.0 0 0 0 0 0 NaN
54 2020-06-28 WV 2817.0 662.0 32.0 NaN 10.0 4.0 2062.0 B ... NaN NaN NaN 2723.0 0 0 0 0 0 NaN

50 rows × 25 columns

all_cases.head(50)
date state abbrev population positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently ... negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade bedsPerThousand total_beds
0 2020-06-28 Alaska AK 734002 883.0 348.0 12.0 NaN NaN 1.0 ... NaN NaN 0 0 0 0 0 NaN 2.2 1614.8044
1 2020-06-28 Alabama AL 4908621 35441.0 15656.0 655.0 2703.0 NaN NaN ... NaN 34964.0 0 0 0 0 0 NaN 3.1 15216.7251
2 2020-06-28 Arkansas AR 3038999 19310.0 5781.0 278.0 1373.0 NaN 63.0 ... NaN 19310.0 0 0 0 0 0 NaN 3.2 9724.7968
3 2020-06-28 Arizona AZ 7378494 73908.0 63394.0 2691.0 4617.0 666.0 475.0 ... NaN 73497.0 0 0 0 0 0 NaN 1.9 14019.1386
4 2020-06-28 California CA 39937489 211243.0 NaN 5956.0 NaN 1602.0 NaN ... NaN 211243.0 0 0 0 0 0 NaN 1.8 71887.4802
5 2020-06-28 Colorado CO 5845526 32022.0 25906.0 235.0 5399.0 NaN NaN ... NaN 29194.0 0 0 0 0 0 NaN 1.9 11106.4994
6 2020-06-28 Connecticut CT 3563077 46303.0 33934.0 103.0 10268.0 NaN NaN ... NaN 44324.0 0 0 0 0 0 NaN 2.0 7126.1540
7 2020-06-28 District of Columbia DC 720687 10248.0 8499.0 126.0 NaN 34.0 27.0 ... NaN 10248.0 0 0 0 0 0 NaN 4.4 3171.0228
8 2020-06-28 Delaware DE 982895 11226.0 4054.0 78.0 NaN 14.0 NaN ... NaN 10162.0 0 0 0 0 0 NaN 2.2 2162.3690
9 2020-06-28 Florida FL 21992985 141075.0 NaN NaN 14540.0 NaN NaN ... 2070179.0 141075.0 0 0 0 0 0 NaN 2.6 57181.7610
10 2020-06-28 Georgia GA 10736059 77210.0 NaN 1236.0 10711.0 NaN NaN ... 736057.0 77210.0 0 0 0 0 0 NaN 2.4 25766.5416
11 2020-06-28 Hawaii HI 1412687 872.0 140.0 NaN 110.0 NaN NaN ... 87010.0 872.0 0 0 0 0 0 NaN 1.9 2684.1053
12 2020-06-28 Iowa IA 3179849 28489.0 10164.0 118.0 NaN 36.0 18.0 ... NaN 28489.0 0 0 0 0 0 NaN 3.0 9539.5470
13 2020-06-28 Idaho ID 1826156 5319.0 1330.0 NaN 312.0 NaN NaN ... NaN 4790.0 0 0 0 0 0 NaN 1.9 3469.6964
14 2020-06-28 Illinois IL 12659682 142776.0 NaN 1464.0 NaN 373.0 193.0 ... NaN 141723.0 0 0 0 0 0 NaN 2.5 31649.2050
15 2020-06-28 Indiana IN 6745354 44930.0 8376.0 617.0 7003.0 266.0 86.0 ... NaN 44930.0 0 0 0 0 0 NaN 2.7 18212.4558
16 2020-06-28 Kansas KS 2910357 13538.0 12495.0 NaN 1128.0 NaN NaN ... NaN 13538.0 0 0 0 0 0 NaN 3.3 9604.1781
17 2020-06-28 Kentucky KY 4499692 15232.0 10944.0 386.0 2590.0 68.0 NaN ... NaN 14732.0 0 0 0 0 0 NaN 3.2 14399.0144
18 2020-06-28 Louisiana LA 4645184 56236.0 13245.0 715.0 NaN NaN 76.0 ... NaN 56236.0 0 0 0 0 0 NaN 3.3 15329.1072
19 2020-06-28 Massachusetts MA 6976597 108667.0 NaN 748.0 11319.0 134.0 81.0 ... NaN 103539.0 0 0 0 0 0 NaN 2.3 16046.1731
20 2020-06-28 Maryland MD 6083116 66777.0 58633.0 446.0 10793.0 158.0 NaN ... NaN 66777.0 0 0 0 0 0 NaN 1.9 11557.9204
21 2020-06-28 Maine ME 1345790 3191.0 510.0 31.0 346.0 10.0 4.0 ... 89123.0 2838.0 0 0 0 0 0 NaN 2.5 3364.4750
22 2020-06-28 Michigan MI 10045029 69946.0 12689.0 557.0 NaN 193.0 106.0 ... 946733.0 63261.0 0 0 0 0 0 NaN 2.5 25112.5725
23 2020-06-28 Minnesota MN 5700671 35549.0 3280.0 288.0 4010.0 143.0 NaN ... NaN 35549.0 0 0 0 0 0 NaN 2.5 14251.6775
24 2020-06-28 Missouri MO 6169270 20575.0 NaN 412.0 NaN NaN NaN ... 399926.0 20575.0 0 0 0 0 0 NaN 3.1 19124.7370
25 2020-06-28 Mississippi MS 2989260 25892.0 7611.0 676.0 3102.0 149.0 88.0 ... NaN 25724.0 0 0 0 0 0 NaN 4.0 11957.0400
26 2020-06-28 Montana MT 1086759 863.0 237.0 11.0 97.0 NaN NaN ... NaN 863.0 0 0 0 0 0 NaN 3.3 3586.3047
27 2020-06-28 North Carolina NC 10611862 62142.0 23899.0 890.0 NaN NaN NaN ... NaN 62142.0 0 0 0 0 0 NaN 2.1 22284.9102
28 2020-06-28 North Dakota ND 761723 3495.0 268.0 24.0 226.0 NaN NaN ... NaN 3495.0 0 0 0 0 0 NaN 4.3 3275.4089
29 2020-06-28 Nebraska NE 1952570 18775.0 5455.0 123.0 1315.0 NaN NaN ... NaN 18775.0 0 0 0 0 0 NaN 3.6 7029.2520
30 2020-06-28 New Hampshire NH 1371246 5717.0 949.0 35.0 562.0 NaN NaN ... NaN 5717.0 0 0 0 0 0 NaN 2.1 2879.6166
31 2020-06-28 New Jersey NJ 8936574 171182.0 126115.0 1014.0 19841.0 223.0 187.0 ... NaN 171182.0 0 0 0 0 0 NaN 2.4 21447.7776
32 2020-06-28 New Mexico NM 2096640 11619.0 5877.0 122.0 1851.0 NaN NaN ... NaN 11619.0 0 0 0 0 0 NaN 1.8 3773.9520
33 2020-06-28 Nevada NV 3139658 17160.0 15976.0 511.0 NaN 122.0 59.0 ... NaN 17160.0 0 0 0 0 0 NaN 2.1 6593.2818
34 2020-06-28 New York NY 19440469 392539.0 297694.0 869.0 89995.0 229.0 167.0 ... NaN 392539.0 0 0 0 0 0 NaN 2.7 52489.2663
35 2020-06-28 Ohio OH 11747694 50309.0 NaN 661.0 7681.0 182.0 101.0 ... NaN 46790.0 0 0 0 0 0 NaN 2.8 32893.5432
36 2020-06-28 Oklahoma OK 3954821 12994.0 3212.0 329.0 1456.0 134.0 NaN ... 313021.0 12642.0 0 0 0 0 0 NaN 2.8 11073.4988
37 2020-06-28 Oregon OR 4301089 8341.0 5490.0 149.0 1022.0 53.0 35.0 ... 223317.0 7521.0 0 0 0 0 0 NaN 1.6 6881.7424
38 2020-06-28 Pennsylvania PA 12820878 85496.0 12231.0 648.0 NaN NaN 121.0 ... NaN 81956.0 0 0 0 0 0 NaN 2.9 37180.5462
39 2020-06-28 Rhode Island RI 1056161 16661.0 14134.0 91.0 1984.0 16.0 15.0 ... NaN 16661.0 0 0 0 0 0 NaN 2.1 2217.9381
40 2020-06-28 South Carolina SC 5210095 33320.0 19148.0 954.0 2622.0 NaN NaN ... 317085.0 33221.0 0 0 0 0 0 NaN 2.4 12504.2280
41 2020-06-28 South Dakota SD 903027 6681.0 838.0 75.0 652.0 NaN NaN ... NaN 6681.0 0 0 0 0 0 NaN 4.8 4334.5296
42 2020-06-28 Tennessee TN 6897576 40172.0 13429.0 484.0 2564.0 NaN NaN ... 701761.0 39848.0 0 0 0 0 0 NaN 2.9 20002.9704
43 2020-06-28 Texas TX 29472295 148728.0 66361.0 5497.0 NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 2.3 67786.2785
44 2020-06-28 Utah UT 3282115 21100.0 9002.0 289.0 1396.0 83.0 NaN ... NaN 21100.0 0 0 0 0 0 NaN 1.8 5907.8070
45 2020-06-28 Virginia VA 8626207 61736.0 51999.0 818.0 8823.0 235.0 107.0 ... NaN 59071.0 0 0 0 0 0 NaN 2.1 18115.0347
46 2020-06-28 Vermont VT 628061 1202.0 200.0 15.0 NaN NaN NaN ... NaN 1202.0 0 0 0 0 0 NaN 2.1 1318.9281
47 2020-06-28 Washington WA 7797095 31404.0 NaN 304.0 4240.0 NaN 58.0 ... NaN 31404.0 0 0 0 0 0 NaN 1.7 13255.0615
48 2020-06-28 Wisconsin WI 5851754 30707.0 7977.0 239.0 3393.0 89.0 NaN ... NaN 27743.0 0 0 0 0 0 NaN 2.1 12288.6834
49 2020-06-28 West Virginia WV 1778070 2817.0 662.0 32.0 NaN 10.0 4.0 ... NaN 2723.0 0 0 0 0 0 NaN 3.8 6756.6660

50 rows × 29 columns
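
The total_beds values in the output above are consistent with scaling bedsPerThousand by each state's population. A minimal sketch of that derivation follows; the formula is inferred from the displayed numbers rather than taken from the notebook's code, and the two rows are copied from the table above.

```python
import pandas as pd

# Two rows copied from the output above; the formula is an inference, not the author's code.
states = pd.DataFrame({
    "abbrev": ["AK", "AL"],
    "population": [734002, 4908621],
    "bedsPerThousand": [2.2, 3.1],
})

# Beds per thousand residents scaled up to the whole state population.
states["total_beds"] = states["population"] * states["bedsPerThousand"] / 1000

print(states)
```

Under this assumption, total_beds is constant over time for each state, since both inputs are static.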

  • Load and clean JHU data
  • Merge JHU dataset with main dataset
LastUpdate ProvinceState Active Confirmed Deaths Recovered
5145 2020-06-19 Alaska 695.0 707.0 12.0 0.0
5146 2020-06-19 Arizona 42162.0 43445.0 1283.0 0.0
5147 2020-06-19 Arkansas 13720.0 13928.0 208.0 0.0
5148 2020-06-19 California 161731.0 167086.0 5355.0 0.0
5149 2020-06-19 Colorado 28248.0 29886.0 1638.0 0.0
5150 2020-06-19 Connecticut 41214.0 45440.0 4226.0 0.0
5151 2020-06-19 Delaware 10068.0 10499.0 431.0 0.0
5152 2020-06-19 District of Columbia 9376.0 9903.0 527.0 0.0
5153 2020-06-19 Florida 82865.0 85926.0 3061.0 0.0
5154 2020-06-19 Georgia 58307.0 60912.0 2605.0 0.0
5155 2020-06-19 Hawaii 745.0 762.0 17.0 0.0
5156 2020-06-19 Idaho 3654.0 3743.0 89.0 0.0
5157 2020-06-19 Illinois 128241.0 134778.0 6537.0 0.0
5158 2020-06-19 Indiana 38947.0 41438.0 2491.0 0.0
5159 2020-06-19 Iowa 24181.0 24861.0 680.0 0.0
5160 2020-06-19 Kansas 11502.0 11753.0 251.0 0.0
5161 2020-06-19 Kentucky 12677.0 13197.0 520.0 0.0
5162 2020-06-19 Louisiana 45572.0 48634.0 3062.0 0.0
5163 2020-06-19 Maine 2776.0 2878.0 102.0 0.0
5164 2020-06-19 Maryland 60213.0 63229.0 3016.0 0.0
5165 2020-06-19 Massachusetts 98653.0 106422.0 7769.0 0.0
5166 2020-06-19 Michigan 60737.0 66798.0 6061.0 0.0
5167 2020-06-19 Minnesota 30299.0 31675.0 1376.0 0.0
5168 2020-06-19 Mississippi 19703.0 20641.0 938.0 0.0
5169 2020-06-19 Missouri 16426.0 17371.0 945.0 0.0
5170 2020-06-19 Montana 635.0 655.0 20.0 0.0
5171 2020-06-19 Nebraska 17175.0 17414.0 239.0 0.0
5172 2020-06-19 Nevada 11694.0 12169.0 475.0 0.0
5173 2020-06-19 New Hampshire 5119.0 5450.0 331.0 0.0
5174 2020-06-19 New Jersey 155238.0 168107.0 12869.0 0.0
5175 2020-06-19 New Mexico 9697.0 10153.0 456.0 0.0
5176 2020-06-19 New York 354786.0 385760.0 30974.0 0.0
5177 2020-06-19 North Carolina 46972.0 48168.0 1196.0 0.0
5178 2020-06-19 North Dakota 3118.0 3193.0 75.0 0.0
5179 2020-06-19 Ohio 40489.0 43122.0 2633.0 0.0
5180 2020-06-19 Oklahoma 8989.0 9355.0 366.0 0.0
5181 2020-06-19 Oregon 6179.0 6366.0 187.0 0.0
5182 2020-06-19 Pennsylvania 78322.0 84683.0 6361.0 0.0
5183 2020-06-19 Rhode Island 15384.0 16269.0 885.0 0.0
5184 2020-06-19 South Carolina 20912.0 21533.0 621.0 0.0
5185 2020-06-19 South Dakota 6031.0 6109.0 78.0 0.0
5186 2020-06-19 Tennessee 32262.0 32770.0 508.0 0.0
5187 2020-06-19 Texas 99130.0 101259.0 2129.0 0.0
5188 2020-06-19 Utah 15687.0 15839.0 152.0 0.0
5189 2020-06-19 Vermont 1079.0 1135.0 56.0 0.0
5190 2020-06-19 Virginia 54652.0 56238.0 1586.0 0.0
5191 2020-06-19 Washington 25947.0 27192.0 1245.0 0.0
5192 2020-06-19 West Virginia 2330.0 2418.0 88.0 0.0
5193 2020-06-19 Wisconsin 23157.0 23876.0 719.0 0.0
5194 2020-06-19 Wyoming 1126.0 1144.0 18.0 0.0
date state abbrev population positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently ... negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade bedsPerThousand total_beds
5927 2020-01-26 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615
5928 2020-01-25 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615
5929 2020-01-24 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615
5930 2020-01-23 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615
5931 2020-01-22 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615

5 rows × 29 columns
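
The JHU load-and-merge step listed above could look roughly like the following. This is a sketch under assumptions: the frame names `main` and `jhu`, the join keys, and the left-join strategy are hypothetical, and the `positive` values are illustrative placeholders. Only the JHU column names and the two JHU rows come from the output above.

```python
import pandas as pd

# Hypothetical stand-in for the main dataset (values are placeholders).
main = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-19", "2020-06-19", "2020-06-19"]),
    "state": ["Alaska", "Arizona", "Alabama"],
    "positive": [1.0, 2.0, 3.0],
})

# Two JHU rows copied from the output above.
jhu = pd.DataFrame({
    "LastUpdate": ["2020-06-19", "2020-06-19"],
    "ProvinceState": ["Alaska", "Arizona"],
    "Active": [695.0, 42162.0],
    "Confirmed": [707.0, 43445.0],
})

# Align the key names and types before merging.
jhu = jhu.rename(columns={"LastUpdate": "date", "ProvinceState": "state"})
jhu["date"] = pd.to_datetime(jhu["date"]).dt.normalize()

# validate= raises if either side has duplicate (date, state) keys;
# indicator= adds a _merge column showing which rows found a match.
merged = main.merge(
    jhu, on=["date", "state"], how="left",
    validate="one_to_one", indicator=True,
)

unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["date", "state"]])
```

Checking the unmatched rows after the merge is one way to cover the "validate and verify" part of the workflow, since it surfaces states or dates present in one source but not the other.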

Exploratory Data Analysis of the US Dataset

Basic triad of the dataset: validating data types and data integrity of each row

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5932 entries, 0 to 5931
Data columns (total 29 columns):
date                      5932 non-null datetime64[ns]
state                     5932 non-null object
abbrev                    5932 non-null object
population                5932 non-null int64
positive                  5932 non-null float64
active                    5932 non-null float64
hospitalizedCurrently     3645 non-null float64
hospitalizedCumulative    3234 non-null float64
inIcuCurrently            1883 non-null float64
onVentilatorCurrently     1675 non-null float64
recovered                 5932 non-null float64
dataQualityGrade          4998 non-null object
lastUpdateEt              5577 non-null object
dateModified              5577 non-null object
checkTimeEt               5577 non-null object
death                     5932 non-null float64
hospitalized              3234 non-null float64
totalTestsViral           1592 non-null float64
positiveTestsViral        535 non-null float64
negativeTestsViral        535 non-null float64
positiveCasesViral        3108 non-null float64
commercialScore           5932 non-null int64
negativeRegularScore      5932 non-null int64
negativeScore             5932 non-null int64
positiveScore             5932 non-null int64
score                     5932 non-null int64
grade                     0 non-null float64
bedsPerThousand           5932 non-null float64
total_beds                5932 non-null float64
dtypes: datetime64[ns](1), float64(16), int64(6), object(6)
memory usage: 1.4+ MB
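
The summary above shows a column with no non-null values (grade) alongside score columns that are fully populated. A few automated checks can flag such columns; the sketch below uses a small hypothetical frame, and the third `positive` value is illustrative rather than taken from the data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset; the last 'positive' value is illustrative.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-28", "2020-06-28", "2020-06-27"]),
    "state": ["Alaska", "Alabama", "Alaska"],
    "positive": [883.0, 35441.0, 854.0],
    "score": [0, 0, 0],
    "grade": [np.nan, np.nan, np.nan],
})

# Columns that contain no data at all.
empty_cols = [c for c in df.columns if df[c].isna().all()]

# Columns holding a single distinct value, which carry no information.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) == 1]

# Each (date, state) pair should appear only once.
duplicate_keys = int(df.duplicated(subset=["date", "state"]).sum())

# Case counts should never be negative.
negative_counts = int((df["positive"] < 0).sum())

print(empty_cols, constant_cols, duplicate_keys, negative_counts)
```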
covid_df.head(50)
date state abbrev population positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently ... negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade bedsPerThousand total_beds
0 2020-06-28 Alaska AK 734002 883.000 348.000 12.000 nan nan 1.000 ... nan nan 0 0 0 0 0 nan 2.200 1614.804
1 2020-06-28 Alabama AL 4908621 35441.000 15656.000 655.000 2703.000 nan nan ... nan 34964.000 0 0 0 0 0 nan 3.100 15216.725
2 2020-06-28 Arkansas AR 3038999 19310.000 5781.000 278.000 1373.000 nan 63.000 ... nan 19310.000 0 0 0 0 0 nan 3.200 9724.797
3 2020-06-28 Arizona AZ 7378494 73908.000 63394.000 2691.000 4617.000 666.000 475.000 ... nan 73497.000 0 0 0 0 0 nan 1.900 14019.139
4 2020-06-28 California CA 39937489 211243.000 205338.000 5956.000 nan 1602.000 nan ... nan 211243.000 0 0 0 0 0 nan 1.800 71887.480
5 2020-06-28 Colorado CO 5845526 32022.000 25906.000 235.000 5399.000 nan nan ... nan 29194.000 0 0 0 0 0 nan 1.900 11106.499
6 2020-06-28 Connecticut CT 3563077 46303.000 33934.000 103.000 10268.000 nan nan ... nan 44324.000 0 0 0 0 0 nan 2.000 7126.154
7 2020-06-28 District of Columbia DC 720687 10248.000 8499.000 126.000 nan 34.000 27.000 ... nan 10248.000 0 0 0 0 0 nan 4.400 3171.023
8 2020-06-28 Delaware DE 982895 11226.000 4054.000 78.000 nan 14.000 nan ... nan 10162.000 0 0 0 0 0 nan 2.200 2162.369
9 2020-06-28 Florida FL 21992985 141075.000 137557.000 nan 14540.000 nan nan ... 2070179.000 141075.000 0 0 0 0 0 nan 2.600 57181.761
10 2020-06-28 Georgia GA 10736059 77210.000 74432.000 1236.000 10711.000 nan nan ... 736057.000 77210.000 0 0 0 0 0 nan 2.400 25766.542
11 2020-06-28 Hawaii HI 1412687 872.000 140.000 nan 110.000 nan nan ... 87010.000 872.000 0 0 0 0 0 nan 1.900 2684.105
12 2020-06-28 Iowa IA 3179849 28489.000 10164.000 118.000 nan 36.000 18.000 ... nan 28489.000 0 0 0 0 0 nan 3.000 9539.547
13 2020-06-28 Idaho ID 1826156 5319.000 1330.000 nan 312.000 nan nan ... nan 4790.000 0 0 0 0 0 nan 1.900 3469.696
14 2020-06-28 Illinois IL 12659682 142776.000 135687.000 1464.000 nan 373.000 193.000 ... nan 141723.000 0 0 0 0 0 nan 2.500 31649.205
15 2020-06-28 Indiana IN 6745354 44930.000 8376.000 617.000 7003.000 266.000 86.000 ... nan 44930.000 0 0 0 0 0 nan 2.700 18212.456
16 2020-06-28 Kansas KS 2910357 13538.000 12495.000 nan 1128.000 nan nan ... nan 13538.000 0 0 0 0 0 nan 3.300 9604.178
17 2020-06-28 Kentucky KY 4499692 15232.000 10944.000 386.000 2590.000 68.000 nan ... nan 14732.000 0 0 0 0 0 nan 3.200 14399.014
18 2020-06-28 Louisiana LA 4645184 56236.000 13245.000 715.000 nan nan 76.000 ... nan 56236.000 0 0 0 0 0 nan 3.300 15329.107
19 2020-06-28 Massachusetts MA 6976597 108667.000 100607.000 748.000 11319.000 134.000 81.000 ... nan 103539.000 0 0 0 0 0 nan 2.300 16046.173
20 2020-06-28 Maryland MD 6083116 66777.000 58633.000 446.000 10793.000 158.000 nan ... nan 66777.000 0 0 0 0 0 nan 1.900 11557.920
21 2020-06-28 Maine ME 1345790 3191.000 510.000 31.000 346.000 10.000 4.000 ... 89123.000 2838.000 0 0 0 0 0 nan 2.500 3364.475
22 2020-06-28 Michigan MI 10045029 69946.000 12689.000 557.000 nan 193.000 106.000 ... 946733.000 63261.000 0 0 0 0 0 nan 2.500 25112.572
23 2020-06-28 Minnesota MN 5700671 35549.000 3280.000 288.000 4010.000 143.000 nan ... nan 35549.000 0 0 0 0 0 nan 2.500 14251.678
24 2020-06-28 Missouri MO 6169270 20575.000 19578.000 412.000 nan nan nan ... 399926.000 20575.000 0 0 0 0 0 nan 3.100 19124.737
25 2020-06-28 Mississippi MS 2989260 25892.000 7611.000 676.000 3102.000 149.000 88.000 ... nan 25724.000 0 0 0 0 0 nan 4.000 11957.040
26 2020-06-28 Montana MT 1086759 863.000 237.000 11.000 97.000 nan nan ... nan 863.000 0 0 0 0 0 nan 3.300 3586.305
27 2020-06-28 North Carolina NC 10611862 62142.000 23899.000 890.000 nan nan nan ... nan 62142.000 0 0 0 0 0 nan 2.100 22284.910
28 2020-06-28 North Dakota ND 761723 3495.000 268.000 24.000 226.000 nan nan ... nan 3495.000 0 0 0 0 0 nan 4.300 3275.409
29 2020-06-28 Nebraska NE 1952570 18775.000 5455.000 123.000 1315.000 nan nan ... nan 18775.000 0 0 0 0 0 nan 3.600 7029.252
30 2020-06-28 New Hampshire NH 1371246 5717.000 949.000 35.000 562.000 nan nan ... nan 5717.000 0 0 0 0 0 nan 2.100 2879.617
31 2020-06-28 New Jersey NJ 8936574 171182.000 126115.000 1014.000 19841.000 223.000 187.000 ... nan 171182.000 0 0 0 0 0 nan 2.400 21447.778
32 2020-06-28 New Mexico NM 2096640 11619.000 5877.000 122.000 1851.000 nan nan ... nan 11619.000 0 0 0 0 0 nan 1.800 3773.952
33 2020-06-28 Nevada NV 3139658 17160.000 15976.000 511.000 nan 122.000 59.000 ... nan 17160.000 0 0 0 0 0 nan 2.100 6593.282
34 2020-06-28 New York NY 19440469 392539.000 297694.000 869.000 89995.000 229.000 167.000 ... nan 392539.000 0 0 0 0 0 nan 2.700 52489.266
35 2020-06-28 Ohio OH 11747694 50309.000 47502.000 661.000 7681.000 182.000 101.000 ... nan 46790.000 0 0 0 0 0 nan 2.800 32893.543
36 2020-06-28 Oklahoma OK 3954821 12994.000 3212.000 329.000 1456.000 134.000 nan ... 313021.000 12642.000 0 0 0 0 0 nan 2.800 11073.499
37 2020-06-28 Oregon OR 4301089 8341.000 5490.000 149.000 1022.000 53.000 35.000 ... 223317.000 7521.000 0 0 0 0 0 nan 1.600 6881.742
38 2020-06-28 Pennsylvania PA 12820878 85496.000 12231.000 648.000 nan nan 121.000 ... nan 81956.000 0 0 0 0 0 nan 2.900 37180.546
39 2020-06-28 Rhode Island RI 1056161 16661.000 14134.000 91.000 1984.000 16.000 15.000 ... nan 16661.000 0 0 0 0 0 nan 2.100 2217.938
40 2020-06-28 South Carolina SC 5210095 33320.000 19148.000 954.000 2622.000 nan nan ... 317085.000 33221.000 0 0 0 0 0 nan 2.400 12504.228
41 2020-06-28 South Dakota SD 903027 6681.000 838.000 75.000 652.000 nan nan ... nan 6681.000 0 0 0 0 0 nan 4.800 4334.530
42 2020-06-28 Tennessee TN 6897576 40172.000 13429.000 484.000 2564.000 nan nan ... 701761.000 39848.000 0 0 0 0 0 nan 2.900 20002.970
43 2020-06-28 Texas TX 29472295 148728.000 66361.000 5497.000 nan nan nan ... nan nan 0 0 0 0 0 nan 2.300 67786.278
44 2020-06-28 Utah UT 3282115 21100.000 9002.000 289.000 1396.000 83.000 nan ... nan 21100.000 0 0 0 0 0 nan 1.800 5907.807
45 2020-06-28 Virginia VA 8626207 61736.000 51999.000 818.000 8823.000 235.000 107.000 ... nan 59071.000 0 0 0 0 0 nan 2.100 18115.035
46 2020-06-28 Vermont VT 628061 1202.000 200.000 15.000 nan nan nan ... nan 1202.000 0 0 0 0 0 nan 2.100 1318.928
47 2020-06-28 Washington WA 7797095 31404.000 30094.000 304.000 4240.000 nan 58.000 ... nan 31404.000 0 0 0 0 0 nan 1.700 13255.061
48 2020-06-28 Wisconsin WI 5851754 30707.000 7977.000 239.000 3393.000 89.000 nan ... nan 27743.000 0 0 0 0 0 nan 2.100 12288.683
49 2020-06-28 West Virginia WV 1778070 2817.000 662.000 32.000 nan 10.000 4.000 ... nan 2723.000 0 0 0 0 0 nan 3.800 6756.666

50 rows × 29 columns

The NaN values may indicate that there were few or no COVID-19 patients on those dates, or that a state did not report the metric. We further analyse the summary statistics of the dataset columns to check data integrity and accuracy.
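
One way to see where the NaN values are concentrated is to compute the missing share per column, and to check whether a state has never reported a given metric. The frame below is a hypothetical miniature; apart from the Alaska and California values visible in the tables above, the numbers are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature; the second row per state is illustrative.
df = pd.DataFrame({
    "state": ["AK", "AK", "CA", "CA"],
    "hospitalizedCurrently": [12.0, np.nan, 5956.0, 5790.0],
    "inIcuCurrently": [np.nan, np.nan, 1602.0, 1580.0],
})

# Fraction of missing values in each metric column.
missing_share = df.drop(columns="state").isna().mean()

# True where a state has no value at all for a metric.
never_reported = df.set_index("state").isna().groupby(level=0).all()

print(missing_share)
print(never_reported)
```

A metric that is missing for every date in a state points to non-reporting rather than to an absence of patients.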

population positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently recovered death hospitalized ... negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade bedsPerThousand total_beds
count 5932.000 5932.000 5932.000 3645.000 3234.000 1883.000 1675.000 5932.000 5932.000 3234.000 ... 535.000 3108.000 5932.000 5932.000 5932.000 5932.000 5932.000 0.000 5932.000 5932.000
mean 6542964.221 21163.607 18746.564 1023.799 4369.803 441.040 224.801 4474.004 1101.755 4369.803 ... 293835.318 32231.603 0.000 0.000 0.000 0.000 0.000 nan 2.626 15806.395
std 7387050.444 46807.026 42033.172 1927.101 12949.481 692.449 328.899 11042.022 2921.727 12949.481 ... 389283.058 56691.350 0.000 0.000 0.000 0.000 0.000 nan 0.744 16159.661
min 567025.000 0.000 0.000 1.000 0.000 2.000 0.000 0.000 0.000 0.000 ... 17.000 0.000 0.000 0.000 0.000 0.000 0.000 nan 1.600 1318.928
25% 1778070.000 640.000 555.000 121.000 223.000 82.000 35.500 0.000 13.000 223.000 ... 50018.000 5033.000 0.000 0.000 0.000 0.000 0.000 nan 2.100 3773.952
50% 4499692.000 5122.000 4543.000 402.000 973.000 181.000 94.000 218.000 147.000 973.000 ... 140972.000 13770.500 0.000 0.000 0.000 0.000 0.000 nan 2.500 11557.920
75% 7797095.000 20840.750 17541.250 1032.000 3255.250 482.000 249.000 3140.500 782.250 3255.250 ... 360303.000 35463.250 0.000 0.000 0.000 0.000 0.000 nan 3.100 19124.737
max 39937489.000 392539.000 356899.000 18825.000 89995.000 5225.000 2425.000 79974.000 24835.000 89995.000 ... 2070179.000 392539.000 0.000 0.000 0.000 0.000 0.000 nan 4.800 71887.480

8 rows × 22 columns

final_100k_last_month.head()
positive_100k active_100k recovered_100k death_100k hospitalizedCumulative_100k inIcuCurrently_100k onVentilatorCurrently_100k BedsPer100k
date
2020-04-19 nan nan nan nan nan 153.528 80.717 13440.000
2020-04-20 413.759 391.692 35.481 25.728 22.652 156.581 79.710 13440.000
2020-04-21 387.394 360.446 65.218 30.520 31.446 166.081 78.603 13440.000
2020-04-22 428.601 989.954 412.625 28.780 36.181 167.561 78.032 13440.000
2020-04-23 452.031 -2213.482 72.921 26.282 28.842 166.277 94.521 13440.000
final_100k_last_month.describe()
positive_100k active_100k recovered_100k death_100k hospitalizedCumulative_100k inIcuCurrently_100k onVentilatorCurrently_100k BedsPer100k
count 61.000 61.000 61.000 61.000 61.000 62.000 62.000 62.000
mean 358.759 336.008 170.212 17.931 34.329 113.658 62.620 13440.000
std 65.620 442.921 105.723 7.283 42.821 26.916 13.514 0.000
min 245.203 -2213.482 35.481 4.880 -93.926 70.613 39.353 13440.000
25% 308.315 292.339 107.989 12.184 21.638 94.079 53.461 13440.000
50% 344.558 332.717 147.227 17.253 25.122 111.563 62.120 13440.000
75% 405.031 370.778 211.312 23.811 29.823 126.991 74.683 13440.000
max 544.349 2291.210 626.665 33.917 246.371 167.561 94.521 13440.000
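
The `_100k` columns above suggest that counts were normalised per 100,000 residents. A minimal sketch of such a conversion is shown below; it assumes a plain count / population × 100,000 scaling, which may differ from how the notebook actually built these columns. The two rows are copied from the state table earlier in the notebook.

```python
import pandas as pd

# Two rows copied from the state table; the scaling itself is an assumption.
df = pd.DataFrame({
    "abbrev": ["AK", "AL"],
    "population": [734002, 4908621],
    "positive": [883.0, 35441.0],
    "hospitalizedCurrently": [12.0, 655.0],
})

count_cols = ["positive", "hospitalizedCurrently"]

# Divide each count by the state population, row by row, then scale to 100k.
per_100k = (
    df[count_cols].div(df["population"], axis=0) * 100_000
).add_suffix("_100k")

print(per_100k)
```

A rate built this way from cumulative counts cannot be negative, so values such as the -2213.482 minimum of active_100k in the summary above would be worth a closer look.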

Graphical Exploratory Analysis

Plotting histograms, scatterplots and boxplots to assess the distribution of the entire US dataset.

timeseries_usa_df.tail()
date positive_100k active_100k recovered_100k death_100k hospitalizedCurrently_100k inIcuCurrently_100k onVentilatorCurrently_100k BedsPer100k
154 2020-06-24 33315.285 19401.954 12359.391 1553.940 408.570 68.612 36.820 13440.000
155 2020-06-25 33812.912 19730.969 12498.864 1583.079 414.087 67.864 36.962 13440.000
156 2020-06-26 34335.924 20098.997 12643.998 1592.929 404.115 67.051 34.318 13440.000
157 2020-06-27 34829.638 20417.559 12812.241 1599.839 407.257 68.533 35.118 13440.000
158 2020-06-28 35334.565 20809.528 12921.408 1603.630 402.011 65.968 33.930 13440.000

Analysis of Hospitalizations by State

New York:

[1 figure; y-axis: No. Patients]

Alabama:

[3 figures; y-axis: No. Patients]

Arizona:

[5 figures; y-axes: No. Patients (first three), % Positive Cases in Hospital, No. Patients]

Arkansas:

[3 figures; y-axis: No. Patients]

California:

[4 figures; y-axis: No. Patients]

Colorado:

[3 figures; y-axes: No. Patients, No. Killed, No. Killed]

Connecticut:

[3 figures; y-axes: No. Patients, No. Killed, No. Killed]

Delaware:

[2 figures; y-axes: No. Patients, No. Killed]

Florida:

[2 figures; y-axis: No. Patients]
# TODO: fix the legend, axes, and plot altogether
# Time-series plot of cumulative positive viral tests in Florida
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 12))
ax.plot(fl.date, fl.positiveTestsViral, linewidth=4.7, color='r')
ax.set_title('Cumulative Number of Positive Viral Tests in Florida', fontsize=23)
ax.set_xlabel('Date')
ax.set_ylabel('No. Patients')
[2 figures; y-axes: No. Patients, % Infected]

Georgia:

[4 figures; y-axes: No. Patients (first three), % Infection Rate]

South Carolina:

[5 figures; y-axes: No. Patients (first four), % Infection Rate]

Texas:

[3 figures; y-axis: No. Patients]

Nevada:

[3 figures; y-axis: No. Patients]

Mississippi:

[4 figures; y-axis: No. Patients]

Utah:

[3 figures; y-axis: No. Patients]

Oklahoma:

[1 figure; y-axis: No. Patients]

Assessing Correlation of Independent Variables

[Figure: correlation plot of the independent variables]
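
The correlation assessment can also be expressed numerically. The sketch below computes pairwise Pearson correlations with `DataFrame.corr` and lists each variable pair once; the four rows are copied from the state table earlier in the notebook, and the choice of columns is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Rows for AK, AL, AR, AZ copied from the state table; column choice is illustrative.
df = pd.DataFrame({
    "positive": [883.0, 35441.0, 19310.0, 73908.0],
    "active": [348.0, 15656.0, 5781.0, 63394.0],
    "hospitalizedCurrently": [12.0, 655.0, 278.0, 2691.0],
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()

# Keep only the upper triangle so each pair is listed once.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)

print(pairs)
```

Strongly correlated predictors, such as positive and active here, carry overlapping information, which matters when they are used together in a regression model.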

Build Model for Dependent Variable

  • To be used to predict current hospitalizations (hospitalizedCurrently)
  • More complete data for inIcuCurrently and onVentilatorCurrently would allow us to predict those numbers as well.
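
The notebook does not say which estimator is used, so the following is only a sketch of how a model for hospitalizedCurrently might be set up. It assumes a plain scikit-learn linear regression, and it runs on synthetic data generated purely for illustration; none of the numbers come from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only; not drawn from the notebook's dataset.
rng = np.random.default_rng(0)
n = 200
positive = rng.uniform(1_000, 50_000, n)
X = pd.DataFrame({
    "positive": positive,
    "active": positive * rng.uniform(0.3, 0.9, n),
    "total_beds": rng.uniform(1_000, 70_000, n),
})
# Assumed relationship: about 3% of active cases are in hospital, plus noise.
y = 0.03 * X["active"] + rng.normal(0, 20, n)

# Hold out a test set so the score reflects unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)

print(f"R^2 on held-out rows: {r2:.3f}")
```

With real data, rows where hospitalizedCurrently is NaN would have to be dropped or imputed before fitting, which is consistent with the reduced row count in the summary that follows.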
population positive active hospitalizedCurrently inIcuCurrently onVentilatorCurrently recovered death totalTestsViral positiveTestsViral negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade total_beds
count 3608.000 3608.000 3608.000 3608.000 1833.000 1627.000 3608.000 3608.000 1121.000 399.000 399.000 2531.000 3608.000 3608.000 3608.000 3608.000 3608.000 0.000 3608.000
mean 6734094.920 31851.205 28006.326 1020.191 437.771 221.840 7009.118 1716.808 401896.592 25414.065 243476.774 36576.426 0.000 0.000 0.000 0.000 0.000 nan 16006.400
std 7738225.857 56776.272 51040.791 1930.862 698.186 331.324 13459.030 3599.085 542128.442 26144.869 232041.192 61272.107 0.000 0.000 0.000 0.000 0.000 nan 16508.042
min 567025.000 115.000 113.000 1.000 2.000 0.000 0.000 0.000 9055.000 407.000 8648.000 396.000 0.000 0.000 0.000 0.000 0.000 nan 1318.928
25% 1778070.000 3276.500 2835.000 117.000 80.000 34.000 9.000 91.000 87459.000 4128.000 63478.000 6439.500 0.000 0.000 0.000 0.000 0.000 nan 3773.952
50% 4645184.000 12335.500 10087.500 399.500 179.000 91.000 1297.500 474.500 223245.000 14135.000 168871.000 16441.000 0.000 0.000 0.000 0.000 0.000 nan 11557.920
75% 8626207.000 35334.500 29969.500 1014.750 469.000 238.000 6266.500 1598.250 491884.000 44340.500 310173.500 40708.000 0.000 0.000 0.000 0.000 0.000 nan 19124.737
max 39937489.000 392539.000 356899.000 18825.000 5225.000 2425.000 79974.000 24835.000 3955952.000 87087.000 946733.000 392539.000 0.000 0.000 0.000 0.000 0.000 nan 71887.480